Overview

Dataset Statistics

Number of Variables 15
Number of Rows 48842
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 52
Duplicate Rows (%) 0.1%
Total Size in Memory 30.2 MB
Average Row Size in Memory 649.3 B
Variable Types
  • Numerical: 6
  • Categorical: 9
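
The duplicate-row figures above can be reproduced in spirit with a few lines of Python. This is a minimal sketch, not the report's own implementation: it assumes a duplicate is any row that repeats an earlier row exactly, and `duplicate_summary` and `toy_rows` are hypothetical names used only for illustration.

```python
def duplicate_summary(rows):
    """Return (duplicate_count, duplicate_percent) for a list of rows.

    Assumption: a row counts as a duplicate when an identical row
    has already been seen, so n identical rows yield n - 1 duplicates.
    """
    hashable = [tuple(r) for r in rows]
    duplicates = len(hashable) - len(set(hashable))
    percent = 100.0 * duplicates / len(hashable) if hashable else 0.0
    return duplicates, percent


# Toy rows (illustrative only, not taken from the dataset).
toy_rows = [(39, "Private"), (50, "Local-gov"), (39, "Private"), (28, "Private")]
dup_count, dup_pct = duplicate_summary(toy_rows)

# The percentage reported above: 52 duplicate rows out of 48842.
reported_pct = round(100.0 * 52 / 48842, 1)
```

With the counts from the report, 52 out of 48842 rows rounds to the 0.1% shown above.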

Dataset Insights

  • fnlwgt is skewed
  • education-num is skewed
  • capital-gain is skewed
  • capital-loss is skewed
  • hours-per-week is skewed
  • capital-gain has 44807 (91.74%) zeros
  • capital-loss has 46560 (95.33%) zeros
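
The zero percentages follow directly from the zero counts and the row count of 48842. A small sketch of that arithmetic (the helper name `zero_share` is hypothetical):

```python
def zero_share(zeros, total):
    """Percentage of entries equal to zero, rounded to two decimals."""
    return round(100.0 * zeros / total, 2)


# Counts taken from the insights above; 48842 is the reported row count.
capital_gain_zero_pct = zero_share(44807, 48842)
capital_loss_zero_pct = zero_share(46560, 48842)
```

Both results match the reported 91.74% and 95.33%.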

Variables

age

numerical

Approximate Distinct Count 74
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 763.2 KB
Mean 38.6436
Minimum 17
Maximum 90
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • age is skewed right (γ1 = 0.5576)
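
For readers unfamiliar with γ1, the sketch below computes a moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). The report does not say which estimator it uses (for example, with or without bias correction), so treat this as one common definition rather than the report's exact formula. The function name and sample lists are illustrative.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# A symmetric sample has zero skewness; a long right tail gives a positive value.
symmetric = skewness([1, 2, 3])
right_tailed = skewness([1, 2, 3, 4, 10])
```

A positive value, as reported for age, indicates a longer tail toward larger values.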

Quantile Statistics

Minimum 17
5-th Percentile 19
Q1 28
Median 37
Q3 48
95-th Percentile 63
Maximum 90
Range 73
IQR 20

Descriptive Statistics

Mean 38.6436
Standard Deviation 13.7105
Variance 187.9781
Sum 1.8874e+06
Skewness 0.5576
Kurtosis -0.1844
Coefficient of Variation 0.3548
  • age has 216 outliers
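
The report does not state how outliers are defined. One widely used convention is Tukey's rule, which flags values more than 1.5 × IQR below Q1 or above Q3. The sketch below applies that rule to the age quartiles listed above. Both the rule and the multiplier `k=1.5` are assumptions here, and `toy_ages` is made-up data.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1, q3, k=1.5):
    """Count values falling strictly outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < low or v > high)


# Quartiles for age from the Quantile Statistics above: Q1 = 28, Q3 = 48.
age_low, age_high = tukey_fences(28, 48)

# Toy ages (illustrative only): 79 and 90 exceed the upper fence.
toy_ages = [17, 25, 37, 48, 63, 79, 90]
toy_outliers = count_outliers(toy_ages, 28, 48)
```

Under this assumption the fences for age are -2 and 78, so only ages above 78 would be flagged, since the minimum age is 17.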

workclass

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.4 MB
  • The largest value ( Private) is over 8.78 times larger than the second largest value ( Self-emp-not-inc)

Length

Mean 8.8709
Standard Deviation 3.0785
Median 8
Minimum 2
Maximum 17

Sample

1st row Private
2nd row Private
3rd row Local-gov
4th row Private
5th row ?

Letter

Count 360074
Lowercase Letter 314031
Space Separator 48842
Uppercase Letter 46043
Dash Punctuation 21556
Decimal Number 0
  • The top 2 categories ( Private, Self-emp-not-inc) take over 50.0%
  • The largest value (private) is over 6.1 times larger than the second largest value (selfempinc)
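
Two details in this section suggest the column may need cleaning before analysis. The Space Separator count (48842) equals the number of rows, which is consistent with every value carrying one leading space, and the 5th sample row is "?" even though Missing is reported as 0. The sketch below strips whitespace and maps "?" to a missing value. Treating "?" as a missing-value marker is an assumption, and `clean_category` and the sample list (with leading spaces added back) are illustrative.

```python
def clean_category(value, missing_marker="?"):
    """Strip surrounding whitespace; map the assumed missing marker to None."""
    stripped = value.strip()
    return None if stripped == missing_marker else stripped


# First five workclass values from the Sample above, with the leading
# space that the character counts suggest (an assumption).
sample = [" Private", " Private", " Local-gov", " Private", " ?"]
cleaned = [clean_category(v) for v in sample]
missing_count = sum(v is None for v in cleaned)
```

After this step the "?" entries would show up in missing-value counts rather than as a category of their own.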

fnlwgt

numerical

Approximate Distinct Count 28523
Approximate Unique (%) 58.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 763.2 KB
Mean 189664.1346
Minimum 12285
Maximum 1.4904e+06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • fnlwgt is skewed right (γ1 = 1.4388)

Quantile Statistics

Minimum 12285
5-th Percentile 39615.4
Q1 117550.5
Median 178144.5
Q3 237642
95-th Percentile 379481.65
Maximum 1.4904e+06
Range 1.4781e+06
IQR 120091.5

Descriptive Statistics

Mean 189664.1346
Standard Deviation 105604.0254
Variance 1.1152e+10
Sum 9.2636e+09
Skewness 1.4388
Kurtosis 6.0571
Coefficient of Variation 0.5568
  • fnlwgt is not normally distributed (p-value 8.836658140702256e-08)
  • fnlwgt has 1453 outliers
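
Several of the descriptive statistics above are tied together by simple identities: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A quick consistency check using the fnlwgt values from this section:

```python
# Values copied from the Descriptive Statistics for fnlwgt above.
mean = 189664.1346
std = 105604.0254

variance = std ** 2  # should be close to the reported 1.1152e+10
cv = std / mean      # should round to the reported 0.5568
```

Both results agree with the reported figures up to rounding.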

education

categorical

Approximate Distinct Count 16
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.5 MB

Length

Mean 9.4221
Standard Deviation 2.4401
Median 8
Minimum 4
Maximum 13

Sample

1st row 11th
2nd row HS-grad
3rd row Assoc-acdm
4th row Some-college
5th row Some-college

Letter

Count 366588
Lowercase Letter 308287
Space Separator 48842
Uppercase Letter 58301
Dash Punctuation 32869
Decimal Number 11894
  • The top 2 categories ( HS-grad, Some-college) take over 50.0%

education-num

numerical

Approximate Distinct Count 16
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 763.2 KB
Mean 10.0781
Minimum 1
Maximum 16
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • education-num is skewed left (γ1 = -0.3165)

Quantile Statistics

Minimum 1
5-th Percentile 5
Q1 9
Median 10
Q3 12
95-th Percentile 14
Maximum 16
Range 15
IQR 3

Descriptive Statistics

Mean 10.0781
Standard Deviation 2.571
Variance 6.6099
Sum 492234
Skewness -0.3165
Kurtosis 0.6256
Coefficient of Variation 0.2551
  • education-num is not normally distributed (p-value 3.284727489716024e-16)
  • education-num has 1794 outliers

marital-status

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.7 MB

Length

Mean 15.406
Standard Deviation 3.9151
Median 14
Minimum 8
Maximum 22

Sample

1st row Never-married
2nd row Married-civ-spous...
3rd row Married-civ-spous...
4th row Married-civ-spous...
5th row Never-married

Letter

Count 641415
Lowercase Letter 592499
Space Separator 48842
Uppercase Letter 48916
Dash Punctuation 62205
Decimal Number 0
  • The top 2 categories ( Married-civ-spouse, Never-married) take over 50.0%

occupation

categorical

Approximate Distinct Count 15
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.6 MB

Length

Mean 13.187
Standard Deviation 4.2489
Median 14
Minimum 2
Maximum 18

Sample

1st row Machine-op-inspct
2nd row Farming-fishing
3rd row Protective-serv
4th row Machine-op-inspct
5th row ?

Letter

Count 548635
Lowercase Letter 502587
Space Separator 48842
Uppercase Letter 46048
Dash Punctuation 43793
Decimal Number 0

relationship

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.5 MB
  • The largest value ( Husband) is over 1.57 times larger than the second largest value ( Not-in-family)

Length

Mean 10.1387
Standard Deviation 2.7716
Median 10
Minimum 5
Maximum 15

Sample

1st row Own-child
2nd row Husband
3rd row Husband
4th row Husband
5th row Own-child

Letter

Count 412100
Lowercase Letter 363258
Space Separator 48842
Uppercase Letter 48842
Dash Punctuation 34253
Decimal Number 0
  • The top 2 categories ( Husband, Not-in-family) take over 50.0%
  • The largest value (husband) is over 1.57 times larger than the second largest value (notfamily)

race

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.3 MB
  • The largest value ( White) is over 8.91 times larger than the second largest value ( Black)

Length

Mean 6.5294
Standard Deviation 2.5695
Median 6
Minimum 6
Maximum 19

Sample

1st row Black
2nd row White
3rd row White
4th row Black
5th row White

Letter

Count 266089
Lowercase Letter 213269
Space Separator 48842
Uppercase Letter 52820
Dash Punctuation 3978
Decimal Number 0
  • The top 2 categories ( White, Black) take over 50.0%
  • The largest value (white) is over 8.91 times larger than the second largest value (black)

sex

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.3 MB
  • The largest value ( Male) is over 2.02 times larger than the second largest value ( Female)

Length

Mean 5.663
Standard Deviation 0.9415
Median 5
Minimum 5
Maximum 7

Sample

1st row Male
2nd row Male
3rd row Male
4th row Male
5th row Female

Letter

Count 227752
Lowercase Letter 178910
Space Separator 48842
Uppercase Letter 48842
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories ( Male, Female) take over 50.0%
  • The largest value (male) is over 2.02 times larger than the second largest value (female)

capital-gain

numerical

Approximate Distinct Count 123
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 763.2 KB
Mean 1079.0676
Minimum 0
Maximum 99999
Zeros 44807
Zeros (%) 91.7%
Negatives 0
Negatives (%) 0.0%
  • capital-gain is skewed right (γ1 = 11.8943)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 5013
Maximum 99999
Range 99999
IQR 0

Descriptive Statistics

Mean 1079.0676
Standard Deviation 7452.0191
Variance 5.5533e+07
Sum 5.2704e+07
Skewness 11.8943
Kurtosis 152.6773
Coefficient of Variation 6.906
  • capital-gain is not normally distributed (p-value 4.516580682612185e-25)
  • capital-gain has 4035 outliers

capital-loss

numerical

Approximate Distinct Count 99
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 763.2 KB
Mean 87.5023
Minimum 0
Maximum 4356
Zeros 46560
Zeros (%) 95.3%
Negatives 0
Negatives (%) 0.0%
  • capital-loss is skewed right (γ1 = 4.5697)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 4356
Range 4356
IQR 0

Descriptive Statistics

Mean 87.5023
Standard Deviation 403.0046
Variance 162412.669
Sum 4.2738e+06
Skewness 4.5697
Kurtosis 20.0122
Coefficient of Variation 4.6056
  • capital-loss is not normally distributed (p-value 4.300680056551459e-25)
  • capital-loss has 2282 outliers

hours-per-week

numerical

Approximate Distinct Count 96
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 763.2 KB
Mean 40.4224
Minimum 1
Maximum 99
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • hours-per-week is skewed right (γ1 = 0.2387)

Quantile Statistics

Minimum 1
5-th Percentile 17.05
Q1 40
Median 40
Q3 45
95-th Percentile 60
Maximum 99
Range 98
IQR 5

Descriptive Statistics

Mean 40.4224
Standard Deviation 12.3914
Variance 153.5479
Sum 1.9743e+06
Skewness 0.2387
Kurtosis 2.9506
Coefficient of Variation 0.3065
  • hours-per-week is not normally distributed (p-value 1.6083929142580295e-23)
  • hours-per-week has 13496 outliers

native-country

categorical

Approximate Distinct Count 42
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 3.6 MB
  • The largest value ( United-States) is over 46.09 times larger than the second largest value ( Mexico)

Length

Mean 13.3068
Standard Deviation 2.3674
Median 14
Minimum 2
Maximum 27

Sample

1st row United-States
2nd row United-States
3rd row United-States
4th row United-States
5th row United-States

Letter

Count 555817
Lowercase Letter 463369
Space Separator 48842
Uppercase Letter 92448
Dash Punctuation 44344
Decimal Number 0
  • The top 2 categories ( United-States, Mexico) take over 50.0%
  • The largest value (unitedstates) is over 46.09 times larger than the second largest value (mexico)

class

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 3.3 MB
  • The largest value ( <=50K) is over 3.18 times larger than the second largest value ( >50K)

Length

Mean 5.7607
Standard Deviation 0.4266
Median 6
Minimum 5
Maximum 6

Sample

1st row <=50K
2nd row <=50K
3rd row >50K
4th row >50K
5th row <=50K

Letter

Count 48842
Lowercase Letter 0
Space Separator 48842
Uppercase Letter 48842
Dash Punctuation 0
Decimal Number 97684
  • The top 2 categories ( <=50K, >50K) take over 50.0%
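
The two kinds of insight shown for categorical columns (the ratio between the two most frequent values, and their combined share) can be sketched as follows. The function name and the toy labels are hypothetical, and the report's own computation may differ.

```python
from collections import Counter


def top_two_summary(values):
    """Return (ratio, share) for the two most frequent values.

    ratio: count of the most frequent value divided by the runner-up's count.
    share: fraction of all entries covered by those two values.
    """
    (_, top_n), (_, second_n) = Counter(values).most_common(2)
    ratio = top_n / second_n
    share = (top_n + second_n) / len(values)
    return ratio, share


# Toy labels (illustrative only): three "<=50K" for every ">50K".
toy_labels = ["<=50K"] * 3 + [">50K"]
ratio, share = top_two_summary(toy_labels)
```

For a column with only two distinct values, such as class or sex, the top-2 share is always 100%, so the "take over 50.0%" insight carries no information there. The ratio is the more useful indicator of class imbalance.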

Interactions

Correlations

Missing Values